Overview

Dataset statistics

                                  Dataset A      Dataset B
Number of variables               12             12
Number of observations            446            446
Missing cells                     427            412
Missing cells (%)                 8.0%           7.7%
Duplicate rows                    0              0
Duplicate rows (%)                0.0%           0.0%
Total size in memory              45.3 KiB       45.3 KiB
Average record size in memory     104.0 B        104.0 B
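
Both percentage rows follow from the counts. Missing cells (%) divides the missing cells by the total number of cells, 12 variables × 446 observations = 5,352, and the missing cells add up the per-variable counts reported under Variables (Dataset A: Age 95 + Cabin 331 + Embarked 1 = 427; Dataset B: 70 + 342 + 0 = 412). A minimal Python sketch of that arithmetic; the function names are illustrative, and since 45.3 KiB is itself rounded, the average record size only agrees to one decimal place.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Percentage of empty cells in the whole table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def average_record_size_bytes(total_size_kib: float, n_observations: int) -> float:
    """Average in-memory size of one row in bytes (1 KiB = 1024 B)."""
    return total_size_kib * 1024 / n_observations


if __name__ == "__main__":
    # Per-variable missing counts from the Variables section.
    missing_a = 95 + 331 + 1   # Age, Cabin, Embarked in Dataset A
    missing_b = 70 + 342 + 0   # Age, Cabin, Embarked in Dataset B
    print(f"{missing_cells_pct(missing_a, 12, 446):.1f}%")  # 8.0%
    print(f"{missing_cells_pct(missing_b, 12, 446):.1f}%")  # 7.7%
    print(f"{average_record_size_bytes(45.3, 446):.1f} B")  # 104.0 B
```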

Variable types

                                  Dataset A      Dataset B
Numeric                           5              5
Categorical                       4              4
Text                              3              3

Alerts

Dataset A                               Dataset B                               Type
Age has 95 (21.3%) missing values       Age has 70 (15.7%) missing values       Missing
Cabin has 331 (74.2%) missing values    Cabin has 342 (76.7%) missing values    Missing
PassengerId has unique values           PassengerId has unique values           Unique
Name has unique values                  Name has unique values                  Unique
SibSp has 305 (68.4%) zeros             SibSp has 317 (71.1%) zeros             Zeros
Parch has 341 (76.5%) zeros             Parch has 347 (77.8%) zeros             Zeros
Fare has 9 (2.0%) zeros                 Fare has 8 (1.8%) zeros                 Zeros

Reproduction

                                  Dataset A                     Dataset B
Analysis started                  2024-05-07 15:12:04.585484    2024-05-07 15:12:08.542699
Analysis finished                 2024-05-07 15:12:08.540292    2024-05-07 15:12:12.511061
Duration                          3.95 seconds                  3.97 seconds
Software version                  ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration            config.json                   config.json
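
Duration is simply the difference between the two timestamps. The sketch below, which assumes the `YYYY-MM-DD HH:MM:SS.ffffff` layout shown in the table, reproduces both durations and also shows that Dataset B's analysis started about 2.4 ms after Dataset A's finished, so the two profiles were computed one after the other.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    runs = {
        "Dataset A": ("2024-05-07 15:12:04.585484", "2024-05-07 15:12:08.540292"),
        "Dataset B": ("2024-05-07 15:12:08.542699", "2024-05-07 15:12:12.511061"),
    }
    for name, (start, end) in runs.items():
        print(f"{name}: {duration_seconds(start, end):.2f} seconds")
    # Gap between the end of run A and the start of run B.
    gap = duration_seconds(runs["Dataset A"][1], runs["Dataset B"][0])
    print(f"gap: {gap * 1000:.1f} ms")
```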

Variables

PassengerId
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          446            446
Distinct (%)                      100.0%         100.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              439.287        452.46188
Minimum                           3              2
Maximum                           885            891
Zeros                             0              0
Zeros (%)                         0.0%           0.0%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           3              2
5th percentile                    44.5           58.25
Q1                                216.25         222.25
Median                            440            448
Q3                                655.5          682.5
95th percentile                   844.5          843.75
Maximum                           885            891
Range                             882            889
Interquartile range (IQR)         439.25         460.25
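
Range and IQR are plain differences (for Dataset A, 885 - 3 = 882 and 655.5 - 216.25 = 439.25). The fractional quartiles come from interpolating between neighbouring sorted values. The sketch below assumes linear interpolation (Hyndman and Fan type 7, the default of pandas `Series.quantile` and `numpy.percentile`); with 446 values, Q1 falls at position 0.25 × 445 = 111.25 in the sorted data, which fits the .25 fractions above. Python's `statistics.quantiles(..., method="inclusive")` applies the same rule; `quantile_summary` is an illustrative name.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics with linear interpolation between order statistics.

    method="inclusive" places the minimum at the 0th and the maximum at the
    100th percentile, which matches the 'linear' (type 7) rule used by default
    in pandas and NumPy.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    vigintiles = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "Minimum": lo,
        "5th percentile": vigintiles[0],
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th percentile": vigintiles[-1],
        "Maximum": hi,
        "Range": hi - lo,
        "Interquartile range (IQR)": q3 - q1,
    }


if __name__ == "__main__":
    for name, value in quantile_summary([1, 2, 3, 4]).items():
        print(f"{name}: {value}")
```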

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                256.18304      260.65243
Coefficient of variation (CV)     0.58317921     0.57607601
Kurtosis                          -1.1607818     -1.2761312
Mean                              439.287        452.46188
Median Absolute Deviation (MAD)   218            230
Skewness                          0.028394703    -0.02010083
Sum                               195922         201798
Variance                          65629.751      67939.692
Monotonicity                      Not monotonic  Not monotonic
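
Several rows here are derived from one another: Mean = Sum / count (195922 / 446 ≈ 439.287), Variance = Standard deviation², and Coefficient of variation = Standard deviation / Mean. A minimal sketch that recomputes them from the reported figures; `moments_from_summary` is a hypothetical helper, and because the standard deviation is rounded, the results agree only to the displayed precision.

```python
def moments_from_summary(total: float, count: int, std: float):
    """Recover mean, variance and coefficient of variation from a sum,
    a row count and a standard deviation."""
    mean = total / count
    variance = std ** 2   # variance is the square of the standard deviation
    cv = std / mean       # dimensionless spread relative to the mean
    return mean, variance, cv


if __name__ == "__main__":
    for name, total, std in (("Dataset A", 195922, 256.18304),
                             ("Dataset B", 201798, 260.65243)):
        mean, var, cv = moments_from_summary(total, 446, std)
        print(f"{name}: mean={mean:.3f} variance={var:.1f} CV={cv:.4f}")
```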
Histogram with fixed size bins (bins=50)
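
A fixed-size histogram splits the observed range into a set number of equal-width bins (50 here, 7 for SibSp and Parch). The sketch below assumes NumPy-style edges, where every bin is half-open except the last, which also contains the maximum; `fixed_width_histogram` is an illustrative name, not a ydata-profiling function.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Bins are [edge_i, edge_i+1) except the last, which is closed so the
    maximum is counted (the numpy.histogram convention).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the bin width would be zero")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # min() clamps v == hi into the last bin instead of a bin past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


if __name__ == "__main__":
    edges, counts = fixed_width_histogram([0, 1, 2, 3, 4], bins=2)
    print(edges)   # [0.0, 2.0, 4.0]
    print(counts)  # [2, 3]
```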

Common values

Dataset A
Value                 Count    Frequency (%)
167                   1        0.2%
125                   1        0.2%
132                   1        0.2%
556                   1        0.2%
117                   1        0.2%
605                   1        0.2%
719                   1        0.2%
74                    1        0.2%
286                   1        0.2%
52                    1        0.2%
Other values (436)    436      97.8%

Dataset B
Value                 Count    Frequency (%)
823                   1        0.2%
337                   1        0.2%
884                   1        0.2%
677                   1        0.2%
702                   1        0.2%
492                   1        0.2%
123                   1        0.2%
635                   1        0.2%
891                   1        0.2%
405                   1        0.2%
Other values (436)    436      97.8%

Minimum 10 values

Dataset A
Value                 Count    Frequency (%)
3                     1        0.2%
4                     1        0.2%
7                     1        0.2%
9                     1        0.2%
11                    1        0.2%
12                    1        0.2%
13                    1        0.2%
15                    1        0.2%
18                    1        0.2%
20                    1        0.2%

Dataset B
Value                 Count    Frequency (%)
2                     1        0.2%
3                     1        0.2%
4                     1        0.2%
8                     1        0.2%
13                    1        0.2%
14                    1        0.2%
22                    1        0.2%
23                    1        0.2%
27                    1        0.2%
29                    1        0.2%

Survived
Categorical

                                  Dataset A      Dataset B
Distinct                          2              2
Distinct (%)                      0.4%           0.4%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Top values
Dataset A: 0 (264), 1 (182)
Dataset B: 0 (278), 1 (168)

Length

                                  Dataset A      Dataset B
Max length                        1              1
Median length                     1              1
Mean length                       1              1
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  446            446
Distinct characters               2              2
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           1              0
2nd row                           0              0
3rd row                           0              1
4th row                           1              0
5th row                           0              0

Common Values

Dataset A
Value                 Count    Frequency (%)
0                     264      59.2%
1                     182      40.8%

Dataset B
Value                 Count    Frequency (%)
0                     278      62.3%
1                     168      37.7%
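
Each frequency is the count divided by the number of rows (264 / 446 ≈ 59.2%). For columns with more than ten distinct values, such as PassengerId, the remainder is folded into an "Other values (k)" row. A minimal sketch with `collections.Counter`; the ten-row cut-off mirrors the tables in this report and the function name is illustrative.

```python
from collections import Counter


def value_counts(values, top=10):
    """Top `top` values with count and share of all rows, plus an
    'Other values (k)' row summarising the k remaining distinct values."""
    n = len(values)
    ranked = Counter(values).most_common()  # by count; ties keep first-seen order
    rows = [(v, c, f"{100 * c / n:.1f}%") for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / n:.1f}%"))
    return rows


if __name__ == "__main__":
    survived_a = ["0"] * 264 + ["1"] * 182  # Dataset A counts from the table above
    for value, count, pct in value_counts(survived_a):
        print(value, count, pct)
```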

Length

Histogram of lengths of the category

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
0                     264      59.2%
1                     182      40.8%

Dataset B
Value                 Count    Frequency (%)
0                     278      62.3%
1                     168      37.7%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Pclass
Categorical

                                  Dataset A      Dataset B
Distinct                          3              3
Distinct (%)                      0.7%           0.7%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Top values
Dataset A: 3 (244), 1 (121), 2 (81)
Dataset B: 3 (250), 1 (112), 2 (84)

Length

                                  Dataset A      Dataset B
Max length                        1              1
Median length                     1              1
Mean length                       1              1
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  446            446
Distinct characters               3              3
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           1              1
2nd row                           3              2
3rd row                           3              3
4th row                           2              3
5th row                           3              3

Common Values

Dataset A
Value                 Count    Frequency (%)
3                     244      54.7%
1                     121      27.1%
2                     81       18.2%

Dataset B
Value                 Count    Frequency (%)
3                     250      56.1%
1                     112      25.1%
2                     84       18.8%

Length

Histogram of lengths of the category

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
3                     244      54.7%
1                     121      27.1%
2                     81       18.2%

Dataset B
Value                 Count    Frequency (%)
3                     250      56.1%
1                     112      25.1%
2                     84       18.8%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Name
Text

                                  Dataset A      Dataset B
Distinct                          446            446
Distinct (%)                      100.0%         100.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        82             61
Median length                     48             49
Mean length                       27.320628      27.026906
Min length                        15             12

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  12185          12054
Distinct characters               59             59
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            446            446
Unique (%)                        100.0%         100.0%

Sample

Dataset A
1st row   Chibnall, Mrs. (Edith Martha Bowerman)
2nd row   Jussila, Miss. Mari Aina
3rd row   Soholt, Mr. Peter Andreas Lauritz Andersen
4th row   Hart, Miss. Eva Miriam
5th row   Hagland, Mr. Ingvald Olai Olsen

Dataset B
1st row   Reuchlin, Jonkheer. John George
2nd row   Cunningham, Mr. Alfred Fleming
3rd row   Chip, Mr. Chang
4th row   Coelho, Mr. Domingos Fernandeo
5th row   Sirota, Mr. Maurice

Most occurring words

Dataset A
Value                 Count    Frequency (%)
mr                    256      14.0%
miss                  92       5.0%
mrs                   66       3.6%
william               27       1.5%
john                  22       1.2%
master                20       1.1%
george                14       0.8%
james                 14       0.8%
henry                 14       0.8%
thomas                13       0.7%
Other values (926)    1291     70.6%

Dataset B
Value                 Count    Frequency (%)
mr                    267      14.7%
miss                  92       5.1%
mrs                   58       3.2%
william               31       1.7%
john                  20       1.1%
henry                 20       1.1%
master                19       1.0%
james                 13       0.7%
edward                12       0.7%
thomas                11       0.6%
Other values (912)    1273     70.1%
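
The word tables appear to lower-case each value, split it on whitespace and trim punctuation from both ends of each token, which would turn "Mrs." into "mrs" and "(Edith" into "edith" while keeping inner punctuation, as in the ticket tokens "c.a" and "soton/o.q". Their percentages are relative to the total number of words, not rows: in Dataset A the ten listed words account for 538 and the others for 1,291, so "mr" is 256 / 1,829 ≈ 14.0%. The sketch below implements that assumed tokenization; it is not ydata-profiling's own code.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case, split on whitespace, strip leading/trailing punctuation,
    and count the remaining non-empty tokens."""
    counts = Counter()
    for value in values:
        for raw in value.lower().split():
            token = raw.strip(string.punctuation)
            if token:  # skip tokens that were pure punctuation
                counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Chibnall, Mrs. (Edith Martha Bowerman)",  # Name, Dataset A
        "SOTON/O.Q. 3101307",                      # Ticket, Dataset B
    ]
    print(word_counts(sample).most_common())
```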

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
(space)               1383     11.4%
r                     986      8.1%
a                     847      7.0%
e                     833      6.8%
i                     695      5.7%
n                     660      5.4%
s                     659      5.4%
M                     572      4.7%
l                     522      4.3%
o                     503      4.1%
Other values (49)     4525     37.1%

Dataset B
Value                 Count    Frequency (%)
(space)               1371     11.4%
r                     992      8.2%
e                     883      7.3%
a                     812      6.7%
i                     683      5.7%
n                     649      5.4%
s                     624      5.2%
M                     557      4.6%
l                     541      4.5%
o                     486      4.0%
Other values (49)     4456     37.0%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             12185    100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             12054    100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             12185    100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             12054    100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             12185    100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             12054    100.0%

Sex
Categorical

                                  Dataset A      Dataset B
Distinct                          2              2
Distinct (%)                      0.4%           0.4%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Top values
Dataset A: male (285), female (161)
Dataset B: male (293), female (153)

Length

                                  Dataset A      Dataset B
Max length                        6              6
Median length                     4              4
Mean length                       4.7219731      4.6860987
Min length                        4              4
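
Lengths are measured per value, and Total characters (listed under Characters and Unicode) is their sum: Dataset A has 285 "male" (4 characters) and 161 "female" (6 characters), so 285 × 4 + 161 × 6 = 2,106 characters and a mean length of 2,106 / 446 ≈ 4.72. A minimal sketch covering maximum, mean, minimum and total; it leaves the median out, and the function name is illustrative.

```python
import statistics


def length_summary(values):
    """Max, mean and min string length plus the total character count."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
        "Total characters": sum(lengths),
    }


if __name__ == "__main__":
    sex_a = ["male"] * 285 + ["female"] * 161  # Dataset A counts
    print(length_summary(sex_a))
```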

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  2106           2090
Distinct characters               5              5
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           female         male
2nd row                           female         male
3rd row                           male           male
4th row                           female         male
5th row                           male           male

Common Values

Dataset A
Value                 Count    Frequency (%)
male                  285      63.9%
female                161      36.1%

Dataset B
Value                 Count    Frequency (%)
male                  293      65.7%
female                153      34.3%

Length

Histogram of lengths of the category

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
e                     607      28.8%
m                     446      21.2%
a                     446      21.2%
l                     446      21.2%
f                     161      7.6%

Dataset B
Value                 Count    Frequency (%)
e                     599      28.7%
m                     446      21.3%
a                     446      21.3%
l                     446      21.3%
f                     153      7.3%
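
Character frequencies are shares of all characters, not of rows. Because "female" contains two e's, Dataset A has 285 + 2 × 161 = 607 e's out of 2,106 characters (28.8%), while m, a and l occur once in every value (446 each). A sketch with `collections.Counter`; ties keep first-seen order, which happens to match the m, a, l order above, and the function name is illustrative.

```python
from collections import Counter


def character_frequencies(values):
    """Count every character across all strings; the percentage is the
    share of the total character count."""
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    return [(ch, c, f"{100 * c / total:.1f}%") for ch, c in counts.most_common()]


if __name__ == "__main__":
    sex_a = ["male"] * 285 + ["female"] * 161  # Dataset A counts
    for ch, count, pct in character_frequencies(sex_a):
        print(ch, count, pct)
```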

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             2106     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             2090     100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             2106     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             2090     100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             2106     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             2090     100.0%

Age
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          73             73
Distinct (%)                      20.8%          19.4%
Missing                           95             70
Missing (%)                       21.3%          15.7%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              29.534444      29.184628
Minimum                           0.42           0.92
Maximum                           71             80
Zeros                             0              0
Zeros (%)                         0.0%           0.0%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB
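
Distinct (%) is taken over the non-missing values only, while Missing (%) uses all rows: for Dataset A, 73 / (446 - 95) = 73 / 351 ≈ 20.8% distinct and 95 / 446 ≈ 21.3% missing. A minimal sketch that treats both None and float NaN as missing, roughly as pandas does for numeric columns; the synthetic column in the usage lines only reproduces the counts, not the real ages, and the function names are illustrative.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def distinct_summary(values):
    """Distinct count/percentage over non-missing values; missing
    count/percentage over all rows."""
    present = [v for v in values if not is_missing(v)]
    n_missing = len(values) - len(present)
    n_distinct = len(set(present))
    return {
        "Distinct": n_distinct,
        "Distinct (%)": 100 * n_distinct / len(present) if present else 0.0,
        "Missing": n_missing,
        "Missing (%)": 100 * n_missing / len(values) if values else 0.0,
    }


if __name__ == "__main__":
    # Synthetic column shaped like Age in Dataset A: 351 ages over 73 values, 95 NaN.
    ages = [float(i % 73) for i in range(351)] + [math.nan] * 95
    for key, value in distinct_summary(ages).items():
        print(key, round(value, 1))
```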

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0.42           0.92
5th percentile                    4              5
Q1                                21             20
Median                            28             28
Q3                                38             36
95th percentile                   55.5           54
Maximum                           71             80
Range                             70.58          79.08
Interquartile range (IQR)         17             16

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                14.38638       14.008752
Coefficient of variation (CV)     0.48710515     0.48000446
Kurtosis                          0.12985517     0.51525078
Mean                              29.534444      29.184628
Median Absolute Deviation (MAD)   8              8
Skewness                          0.34331607     0.46383981
Sum                               10366.59       10973.42
Variance                          206.96793      196.24512
Monotonicity                      Not monotonic  Not monotonic
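
The Median Absolute Deviation is the median of the absolute distances from the median. The reported values look unscaled, without the 1.4826 factor sometimes applied to make MAD comparable to a standard deviation: for values spread evenly over 1 to 891, the unscaled MAD is about a quarter of that range (223), close to PassengerId's 218 and 230. A minimal sketch under that assumption; the function name is illustrative.

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: median(|x - median(x)|)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


if __name__ == "__main__":
    print(median_absolute_deviation([1, 2, 3, 4, 100]))  # 1
    print(median_absolute_deviation(range(1, 892)))       # 223
```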
Histogram with fixed size bins (bins=50)

Common values

Dataset A
Value                 Count    Frequency (%)
28                    16       3.6%
30                    15       3.4%
24                    14       3.1%
19                    13       2.9%
21                    13       2.9%
33                    12       2.7%
22                    12       2.7%
29                    11       2.5%
36                    11       2.5%
18                    11       2.5%
Other values (63)     223      50.0%
(Missing)             95       21.3%

Dataset B
Value                 Count    Frequency (%)
29                    18       4.0%
22                    17       3.8%
24                    17       3.8%
32                    14       3.1%
18                    14       3.1%
19                    14       3.1%
21                    13       2.9%
28                    12       2.7%
36                    12       2.7%
34                    11       2.5%
Other values (63)     234      52.5%
(Missing)             70       15.7%

Minimum 10 values

Dataset A
Value                 Count    Frequency (%)
0.42                  1        0.2%
0.67                  1        0.2%
0.75                  2        0.4%
1                     3        0.7%
2                     4        0.9%
3                     5        1.1%
4                     5        1.1%
5                     3        0.7%
6                     1        0.2%
7                     1        0.2%

Dataset B
Value                 Count    Frequency (%)
0.92                  1        0.2%
1                     3        0.7%
2                     8        1.8%
3                     3        0.7%
4                     3        0.7%
5                     3        0.7%
6                     2        0.4%
7                     3        0.7%
8                     2        0.4%
9                     4        0.9%

SibSp
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          7              7
Distinct (%)                      1.6%           1.6%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              0.48206278     0.48654709
Minimum                           0              0
Maximum                           8              8
Zeros                             305            317
Zeros (%)                         68.4%          71.1%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5th percentile                    0              0
Q1                                0              0
Median                            0              0
Q3                                1              1
95th percentile                   2              2
Maximum                           8              8
Range                             8              8
Interquartile range (IQR)         1              1

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                0.98455089     1.0927949
Coefficient of variation (CV)     2.0423707      2.2460208
Kurtosis                          18.567066      17.783857
Mean                              0.48206278     0.48654709
Median Absolute Deviation (MAD)   0              0
Skewness                          3.64063        3.749945
Sum                               215            217
Variance                          0.96934045     1.1942006
Monotonicity                      Not monotonic  Not monotonic
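
SibSp's large skewness (3.64) and kurtosis (18.6) describe a long right tail: about 70% zeros and a few rows with values of 5 or 8. Assuming the report uses the same bias-corrected estimators as pandas `Series.skew()` and `Series.kurt()`, the kurtosis shown is excess kurtosis: about 0 for a normal distribution and -1.2 for a uniform one, which is why the evenly spread PassengerId comes out near -1.2. A sketch of the two estimators (kurtosis needs at least four values); the function names are illustrative.

```python
import math


def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness G1 (bias-corrected, as in pandas)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # sample std, ddof=1
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def sample_excess_kurtosis(x):
    """Bias-corrected excess kurtosis G2 (as in pandas); about 0 for normal data."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)  # sample variance
    fourth = sum((v - m) ** 4 for v in x) / s2 ** 2
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


if __name__ == "__main__":
    print(round(sample_excess_kurtosis([1, 2, 3, 4, 5]), 6))  # -1.2
    # Right-skewed toy data in the spirit of SibSp: mostly 0, one large value.
    toy = [0] * 14 + [1] * 5 + [8]
    print(round(sample_skewness(toy), 3), round(sample_excess_kurtosis(toy), 3))
```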
Histogram with fixed size bins (bins=7)

Common values

Dataset A
Value                 Count    Frequency (%)
0                     305      68.4%
1                     108      24.2%
2                     15       3.4%
4                     9        2.0%
3                     5        1.1%
8                     2        0.4%
5                     2        0.4%

Dataset B
Value                 Count    Frequency (%)
0                     317      71.1%
1                     96       21.5%
2                     11       2.5%
4                     10       2.2%
3                     5        1.1%
5                     4        0.9%
8                     3        0.7%

Minimum 10 values

Dataset A
Value                 Count    Frequency (%)
0                     305      68.4%
1                     108      24.2%
2                     15       3.4%
3                     5        1.1%
4                     9        2.0%
5                     2        0.4%
8                     2        0.4%

Dataset B
Value                 Count    Frequency (%)
0                     317      71.1%
1                     96       21.5%
2                     11       2.5%
3                     5        1.1%
4                     10       2.2%
5                     4        0.9%
8                     3        0.7%

Parch
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          7              6
Distinct (%)                      1.6%           1.3%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              0.37443946     0.35426009
Minimum                           0              0
Maximum                           6              5
Zeros                             341            347
Zeros (%)                         76.5%          77.8%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5th percentile                    0              0
Q1                                0              0
Median                            0              0
Q3                                0              0
95th percentile                   2              2
Maximum                           6              5
Range                             6              5
Interquartile range (IQR)         0              0

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                0.79105762     0.77027196
Coefficient of variation (CV)     2.1126449      2.174312
Kurtosis                          10.061917      8.7473145
Mean                              0.37443946     0.35426009
Median Absolute Deviation (MAD)   0              0
Skewness                          2.7304394      2.674933
Sum                               167            158
Variance                          0.62577216     0.59331889
Monotonicity                      Not monotonic  Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Dataset A
Value                 Count    Frequency (%)
0                     341      76.5%
1                     57       12.8%
2                     41       9.2%
3                     3        0.7%
4                     2        0.4%
5                     1        0.2%
6                     1        0.2%

Dataset B
Value                 Count    Frequency (%)
0                     347      77.8%
1                     53       11.9%
2                     40       9.0%
4                     3        0.7%
5                     2        0.4%
3                     1        0.2%

Minimum 10 values

Dataset A
Value                 Count    Frequency (%)
0                     341      76.5%
1                     57       12.8%
2                     41       9.2%
3                     3        0.7%
4                     2        0.4%
5                     1        0.2%
6                     1        0.2%

Dataset B
Value                 Count    Frequency (%)
0                     347      77.8%
1                     53       11.9%
2                     40       9.0%
3                     1        0.2%
4                     3        0.7%
5                     2        0.4%

Ticket
Text

                                  Dataset A      Dataset B
Distinct                          380            384
Distinct (%)                      85.2%          86.1%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        18             18
Median length                     17             17
Mean length                       6.7466368      6.8049327
Min length                        3              3

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  3009           3035
Distinct characters               31             32
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            326            339
Unique (%)                        73.1%          76.0%

Sample

                                  Dataset A      Dataset B
1st row                           113505         19972
2nd row                           4137           239853
3rd row                           348124         1601
4th row                           F.C.C. 13529   SOTON/O.Q. 3101307
5th row                           65303          392092

Most occurring words

Dataset A
Value                 Count    Frequency (%)
pc                    31       5.5%
c.a                   11       2.0%
2                     8        1.4%
ston/o                8        1.4%
a/5                   6        1.1%
sc/paris              6        1.1%
ca                    6        1.1%
347082                4        0.7%
2666                  4        0.7%
s.o.c                 4        0.7%
Other values (399)    476      84.4%

Dataset B
Value                 Count    Frequency (%)
pc                    31       5.4%
c.a                   13       2.3%
ca                    8        1.4%
ston/o                8        1.4%
2                     8        1.4%
a/5                   8        1.4%
1601                  5        0.9%
soton/o.q             5        0.9%
w./c                  4        0.7%
sc/paris              4        0.7%
Other values (403)    475      83.5%

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
3                     382      12.7%
1                     339      11.3%
2                     283      9.4%
4                     255      8.5%
7                     243      8.1%
6                     214      7.1%
0                     197      6.5%
5                     189      6.3%
9                     172      5.7%
8                     133      4.4%
Other values (21)     602      20.0%

Dataset B
Value                 Count    Frequency (%)
3                     372      12.3%
1                     360      11.9%
2                     297      9.8%
7                     239      7.9%
4                     226      7.4%
6                     207      6.8%
0                     204      6.7%
5                     184      6.1%
9                     181      6.0%
8                     149      4.9%
Other values (22)     616      20.3%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             3009     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             3035     100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             3009     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             3035     100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             3009     100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             3035     100.0%

Fare
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          187            183
Distinct (%)                      41.9%          41.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              31.707006      31.233865
Minimum                           0              0
Maximum                           262.375        512.3292
Zeros                             9              8
Zeros (%)                         2.0%           1.8%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5th percentile                    7.225          7.05105
Q1                                7.8958         7.925
Median                            14.75          13
Q3                                34.28645       32.875
95th percentile                   105.05         90.8094
Maximum                           262.375        512.3292
Range                             262.375        512.3292
Interquartile range (IQR)         26.39065       24.95

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                41.724491      45.850089
Coefficient of variation (CV)     1.3159392      1.4679608
Kurtosis                          11.00823       33.155174
Mean                              31.707006      31.233865
Median Absolute Deviation (MAD)   7.2417         5.775
Skewness                          3.0279947      4.6348444
Sum                               14141.325      13930.304
Variance                          1740.9332      2102.2307
Monotonicity                      Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Dataset A
Value                 Count    Frequency (%)
7.8958                22       4.9%
13                    22       4.9%
7.75                  19       4.3%
8.05                  18       4.0%
26                    11       2.5%
7.225                 9        2.0%
0                     9        2.0%
7.925                 9        2.0%
8.6625                8        1.8%
7.775                 8        1.8%
Other values (177)    311      69.7%

Dataset B
Value                 Count    Frequency (%)
8.05                  26       5.8%
13                    21       4.7%
7.75                  18       4.0%
7.8958                17       3.8%
26                    12       2.7%
10.5                  11       2.5%
7.925                 11       2.5%
7.775                 9        2.0%
0                     8        1.8%
8.6625                8        1.8%
Other values (173)    305      68.4%

Minimum 10 values

Dataset A
Value                 Count    Frequency (%)
0                     9        2.0%
5                     1        0.2%
6.45                  1        0.2%
6.4958                1        0.2%
6.75                  1        0.2%
6.8583                1        0.2%
6.95                  1        0.2%
6.975                 1        0.2%
7.0458                1        0.2%
7.05                  2        0.4%

Dataset B
Value                 Count    Frequency (%)
0                     8        1.8%
5                     1        0.2%
6.2375                1        0.2%
6.4375                1        0.2%
6.45                  1        0.2%
6.4958                1        0.2%
6.75                  2        0.4%
6.95                  1        0.2%
6.975                 1        0.2%
7.0458                1        0.2%

Cabin
Text

                                  Dataset A      Dataset B
Distinct                          94             81
Distinct (%)                      81.7%          77.9%
Missing                           331            342
Missing (%)                       74.2%          76.7%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        15             15
Median length                     3              3
Mean length                       3.4956522      3.6538462
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  402            380
Distinct characters               18             19
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            74             61
Unique (%)                        64.3%          58.7%
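
Distinct and Unique count different things: Distinct is the number of different values, Unique the number of values that occur exactly once. For Cabin both percentages are taken over the 446 - 331 = 115 non-missing rows in Dataset A (94 / 115 ≈ 81.7% and 74 / 115 ≈ 64.3%). A minimal sketch, with None standing in for a missing cabin; the function name is illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for the non-missing values: distinct counts
    different values, unique counts values that occur exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


if __name__ == "__main__":
    cabins = ["G6", "G6", "G6", "B96", "C22 C26", None]
    print(distinct_and_unique(cabins))  # (3, 2)
```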

Sample

                                  Dataset A      Dataset B
1st row                           E33            C106
2nd row                           F G73          C22 C26
3rd row                           C22 C26        B35
4th row                           B102           F33
5th row                           E101           C2

Most occurring words

Dataset A
Value                 Count    Frequency (%)
g6                    3        2.3%
e8                    2        1.5%
d36                   2        1.5%
b98                   2        1.5%
b96                   2        1.5%
c126                  2        1.5%
d35                   2        1.5%
e67                   2        1.5%
c78                   2        1.5%
b60                   2        1.5%
Other values (95)     110      84.0%

Dataset B
Value                 Count    Frequency (%)
g6                    3        2.4%
b96                   3        2.4%
b98                   3        2.4%
f33                   3        2.4%
d33                   2        1.6%
b22                   2        1.6%
c26                   2        1.6%
c22                   2        1.6%
g73                   2        1.6%
f                     2        1.6%
Other values (84)     100      80.6%

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
B                     41       10.2%
6                     38       9.5%
3                     34       8.5%
1                     33       8.2%
C                     33       8.2%
2                     33       8.2%
5                     24       6.0%
7                     19       4.7%
E                     19       4.7%
4                     19       4.7%
Other values (8)      109      27.1%

Dataset B
Value                 Count    Frequency (%)
B                     44       11.6%
2                     40       10.5%
3                     31       8.2%
1                     31       8.2%
C                     30       7.9%
5                     24       6.3%
8                     22       5.8%
6                     22       5.8%
(space)               20       5.3%
D                     20       5.3%
Other values (9)      96       25.3%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             402      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             380      100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             402      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             380      100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             402      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             380      100.0%

Embarked
Categorical

                                  Dataset A      Dataset B
Distinct                          3              3
Distinct (%)                      0.7%           0.7%
Missing                           1              0
Missing (%)                       0.2%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Top values
Dataset A: S (309), C (92), Q (44)
Dataset B: S (334), C (76), Q (36)

Length

                                  Dataset A      Dataset B
Max length                        1              1
Median length                     1              1
Mean length                       1              1
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  445            446
Distinct characters               3              3
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           S              S
2nd row                           S              S
3rd row                           S              S
4th row                           S              S
5th row                           S              S

Common Values

Dataset A
Value                 Count    Frequency (%)
S                     309      69.3%
C                     92       20.6%
Q                     44       9.9%
(Missing)             1        0.2%

Dataset B
Value                 Count    Frequency (%)
S                     334      74.9%
C                     76       17.0%
Q                     36       8.1%

Length

Histogram of lengths of the category

Lowercased values, with frequencies computed over the non-missing rows (445 in Dataset A, 446 in Dataset B):

Dataset A
Value                 Count    Frequency (%)
s                     309      69.4%
c                     92       20.7%
q                     44       9.9%

Dataset B
Value                 Count    Frequency (%)
s                     334      74.9%
c                     76       17.0%
q                     36       8.1%

Most occurring characters

Dataset A
Value                 Count    Frequency (%)
S                     309      69.4%
C                     92       20.7%
Q                     44       9.9%

Dataset B
Value                 Count    Frequency (%)
S                     334      74.9%
C                     76       17.0%
Q                     36       8.1%

Most occurring categories

Dataset A
Value                 Count    Frequency (%)
(unknown)             445      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring scripts

Dataset A
Value                 Count    Frequency (%)
(unknown)             445      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Most occurring blocks

Dataset A
Value                 Count    Frequency (%)
(unknown)             445      100.0%

Dataset B
Value                 Count    Frequency (%)
(unknown)             446      100.0%

Interactions

Scatter plots for the numeric variables (PassengerId, Age, SibSp, Parch, Fare): 25 plots, one for each combination of two of these variables, each with one panel for Dataset A and one for Dataset B.

Missing values

[Figure: nullity by column, Dataset A and Dataset B]
A simple visualization of nullity by column.
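The per-column counts behind a nullity-by-column chart can be sketched in a few lines of plain Python. This is a minimal sketch, not ydata-profiling's implementation. It assumes each record is a dict and that a missing cell is stored as `None` (standing in for `NaN`). The helper name `nullity_by_column` is hypothetical, and the three rows are taken from the Dataset A sample, reduced to three columns.

```python
from typing import Dict, List, Optional

# Hypothetical record type: column name -> value, with None marking a missing cell.
Record = Dict[str, Optional[object]]


def nullity_by_column(records: List[Record], columns: List[str]) -> Dict[str, int]:
    """Count the missing (None or absent) cells in each column."""
    counts = {col: 0 for col in columns}
    for row in records:
        for col in columns:
            if row.get(col) is None:
                counts[col] += 1
    return counts


# Three rows from the Dataset A sample, reduced to Age, Cabin and Fare.
rows: List[Record] = [
    {"Age": None, "Cabin": "E33", "Fare": 55.0},    # Chibnall
    {"Age": 21.0, "Cabin": None, "Fare": 9.825},    # Jussila
    {"Age": None, "Cabin": None, "Fare": 7.7375},   # Kilgannon
]
cols = ["Age", "Cabin", "Fare"]

missing = nullity_by_column(rows, cols)
for col in cols:
    present = len(rows) - missing[col]
    print(f"{col}: {present} present, {missing[col]} missing")
```

A bar chart of either the present or the missing counts per column then gives the kind of overview shown in the figure.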

[Figure: nullity matrix, Dataset A and Dataset B]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
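A nullity matrix is essentially a boolean grid with one row per record and one column per variable. The minimal sketch below builds that grid and renders it as text, using `#` for a present value and `.` for a missing one. The helper names `nullity_matrix` and `render` and the text rendering are hypothetical stand-ins for the plotted version; records are assumed to be dicts with `None` marking a missing cell, and the rows come from the Dataset A sample.

```python
from typing import Dict, List, Optional

# Hypothetical record type: column name -> value, with None marking a missing cell.
Record = Dict[str, Optional[object]]


def nullity_matrix(records: List[Record], columns: List[str]) -> List[List[bool]]:
    """One row per record, one column per variable; True means the value is present."""
    return [[row.get(col) is not None for col in columns] for row in records]


def render(matrix: List[List[bool]]) -> str:
    """Text rendering: '#' for a present value, '.' for a missing one."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)


rows: List[Record] = [
    {"Age": None, "Cabin": "E33", "Fare": 55.0},    # Chibnall
    {"Age": 21.0, "Cabin": None, "Fare": 9.825},    # Jussila
    {"Age": None, "Cabin": None, "Fare": 7.7375},   # Kilgannon
]
cols = ["Age", "Cabin", "Fare"]
print(render(nullity_matrix(rows, cols)))
# .##
# #.#
# ..#
```

Reading down a column shows how complete that variable is; rows that share the same pattern of dots point to values that tend to go missing together.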

[Figure: nullity correlation heatmap, Dataset A and Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
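Nullity correlation can be read as the Pearson correlation between the missing-value indicators of two columns: +1 when the columns are always missing together, -1 when one is missing exactly when the other is present, and 0 when the two missingness patterns are unrelated. The sketch below computes it for one pair of columns. It is a minimal illustration, not the code behind the heatmap; the helper name `nullity_correlation` is hypothetical, records are assumed to be dicts with `None` marking a missing cell, and the four rows come from the Dataset A sample.

```python
import math
from typing import Dict, List, Optional

# Hypothetical record type: column name -> value, with None marking a missing cell.
Record = Dict[str, Optional[object]]


def nullity_correlation(records: List[Record], a: str, b: str) -> float:
    """Pearson correlation between the missing-value indicators of columns a and b."""
    n = len(records)
    if n == 0:
        return math.nan
    # 1.0 where the cell is missing, 0.0 where it is present.
    xs = [1.0 if row.get(a) is None else 0.0 for row in records]
    ys = [1.0 if row.get(b) is None else 0.0 for row in records]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        # A column that is always (or never) missing has no variance,
        # so the correlation is undefined.
        return math.nan
    return cov / (sd_x * sd_y)


# Four rows from the Dataset A sample.
rows: List[Record] = [
    {"Age": None, "Cabin": None, "Fare": 7.7375},   # Kilgannon
    {"Age": None, "Cabin": "B102", "Fare": 0.0},    # Fry
    {"Age": 55.0, "Cabin": "C30", "Fare": 30.5},    # Molson
    {"Age": 65.0, "Cabin": None, "Fare": 7.75},     # Duane
]
print(nullity_correlation(rows, "Age", "Age"))    # 1.0
print(nullity_correlation(rows, "Age", "Cabin"))  # 0.0
print(nullity_correlation(rows, "Age", "Fare"))   # nan
```

Fare is never missing in these rows (Fry's fare of 0.0 is a zero, not a missing value), so its indicator has no variance and the function returns NaN instead of dividing by zero.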

Sample

Dataset A

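|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 166 | 167 | 1 | 1 | Chibnall, Mrs. (Edith Martha Bowerman) | female | NaN | 0 | 1 | 113505 | 55.0000 | E33 | S |
| 402 | 403 | 0 | 3 | Jussila, Miss. Mari Aina | female | 21.0 | 1 | 0 | 4137 | 9.8250 | NaN | S |
| 715 | 716 | 0 | 3 | Soholt, Mr. Peter Andreas Lauritz Andersen | male | 19.0 | 0 | 0 | 348124 | 7.6500 | F G73 | S |
| 535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.0 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S |
| 451 | 452 | 0 | 3 | Hagland, Mr. Ingvald Olai Olsen | male | NaN | 1 | 0 | 65303 | 19.9667 | NaN | S |
| 778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN | 0 | 0 | 36865 | 7.7375 | NaN | Q |
| 91 | 92 | 0 | 3 | Andreasson, Mr. Paul Edvin | male | 20.0 | 0 | 0 | 347466 | 7.8542 | NaN | S |
| 498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
| 508 | 509 | 0 | 3 | Olsen, Mr. Henry Margido | male | 28.0 | 0 | 0 | C 4001 | 22.5250 | NaN | S |
| 815 | 816 | 0 | 1 | Fry, Mr. Richard | male | NaN | 0 | 0 | 112058 | 0.0000 | B102 | S |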

Dataset B

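|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 822 | 823 | 0 | 1 | Reuchlin, Jonkheer. John George | male | 38.0 | 0 | 0 | 19972 | 0.0000 | NaN | S |
| 413 | 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 838 | 839 | 1 | 3 | Chip, Mr. Chang | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.0 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S |
| 837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN | 0 | 0 | 392092 | 8.0500 | NaN | S |
| 298 | 299 | 1 | 1 | Saalfeld, Mr. Adolphe | male | NaN | 0 | 0 | 19988 | 30.5000 | C106 | S |
| 297 | 298 | 0 | 1 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
| 369 | 370 | 1 | 1 | Aubart, Mme. Leontine Pauline | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C |
| 172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.0 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 39 | 40 | 1 | 3 | Nicola-Yarred, Miss. Jamila | female | 14.0 | 1 | 0 | 2651 | 11.2417 | NaN | C |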

Dataset A

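|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 314 | 315 | 0 | 2 | Hart, Mr. Benjamin | male | 43.0 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S |
| 283 | 284 | 1 | 3 | Dorking, Mr. Edward Arthur | male | 19.0 | 0 | 0 | A/5. 10482 | 8.0500 | NaN | S |
| 492 | 493 | 0 | 1 | Molson, Mr. Harry Markland | male | 55.0 | 0 | 0 | 113787 | 30.5000 | C30 | S |
| 86 | 87 | 0 | 3 | Ford, Mr. William Neal | male | 16.0 | 1 | 3 | W./C. 6608 | 34.3750 | NaN | S |
| 709 | 710 | 1 | 3 | Moubarek, Master. Halim Gonios ("William George") | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C |
| 381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C |
| 840 | 841 | 0 | 3 | Alhomaki, Mr. Ilmari Rudolf | male | 20.0 | 0 | 0 | SOTON/O2 3101287 | 7.9250 | NaN | S |
| 186 | 187 | 1 | 3 | O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) | female | NaN | 1 | 0 | 370365 | 15.5000 | NaN | Q |
| 412 | 413 | 1 | 1 | Minahan, Miss. Daisy E | female | 33.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q |
| 280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |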

Dataset B

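|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 677 | 678 | 1 | 3 | Turja, Miss. Anna Sofia | female | 18.0 | 0 | 0 | 4138 | 9.8417 | NaN | S |
| 210 | 211 | 0 | 3 | Ali, Mr. Ahmed | male | 24.0 | 0 | 0 | SOTON/O.Q. 3101311 | 7.0500 | NaN | S |
| 196 | 197 | 0 | 3 | Mernagh, Mr. Robert | male | NaN | 0 | 0 | 368703 | 7.7500 | NaN | Q |
| 360 | 361 | 0 | 3 | Skoog, Mr. Wilhelm | male | 40.0 | 1 | 4 | 347088 | 27.9000 | NaN | S |
| 617 | 618 | 0 | 3 | Lobb, Mrs. William Arthur (Cordelia K Stanlick) | female | 26.0 | 1 | 0 | A/5. 3336 | 16.1000 | NaN | S |
| 872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
| 117 | 118 | 0 | 2 | Turpin, Mr. William John Robert | male | 29.0 | 1 | 0 | 11668 | 21.0000 | NaN | S |
| 840 | 841 | 0 | 3 | Alhomaki, Mr. Ilmari Rudolf | male | 20.0 | 0 | 0 | SOTON/O2 3101287 | 7.9250 | NaN | S |
| 777 | 778 | 1 | 3 | Emanuel, Miss. Virginia Ethel | female | 5.0 | 0 | 0 | 364516 | 12.4750 | NaN | S |
| 165 | 166 | 1 | 3 | Goldsmith, Master. Frank John William "Frankie" | male | 9.0 | 0 | 2 | 363291 | 20.5250 | NaN | S |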

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
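The duplicate-row check amounts to grouping identical rows and keeping the groups that occur more than once. Below is a minimal stdlib sketch, assuming records are dicts and that every listed column takes part in the comparison. `duplicate_rows` is a hypothetical helper, and its occurrence count plays a role analogous to the `# duplicates` column. The rows are trimmed copies of entries from the samples, including the Alhomaki row, which appears in both the Dataset A and the Dataset B sample.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical record type: column name -> value, with None marking a missing cell.
Record = Dict[str, Optional[object]]


def duplicate_rows(records: List[Record], columns: List[str]) -> Dict[Tuple, int]:
    """Map each row (as a tuple of column values) that occurs more than once
    to its number of occurrences."""
    keys = [tuple(row.get(col) for col in columns) for row in records]
    return {key: n for key, n in Counter(keys).items() if n > 1}


cols = ["PassengerId", "Name", "Ticket"]
sample_a: List[Record] = [
    {"PassengerId": 841, "Name": "Alhomaki, Mr. Ilmari Rudolf", "Ticket": "SOTON/O2 3101287"},
    {"PassengerId": 281, "Name": "Duane, Mr. Frank", "Ticket": "336439"},
]
sample_b: List[Record] = [
    {"PassengerId": 678, "Name": "Turja, Miss. Anna Sofia", "Ticket": "4138"},
    {"PassengerId": 841, "Name": "Alhomaki, Mr. Ilmari Rudolf", "Ticket": "SOTON/O2 3101287"},
]

print(duplicate_rows(sample_a, cols))             # {}
print(duplicate_rows(sample_a + sample_b, cols))
# {(841, 'Alhomaki, Mr. Ilmari Rudolf', 'SOTON/O2 3101287'): 2}
```

Run on either sample alone, the check finds nothing; only when the two samples are pooled does the shared Alhomaki row show up, with a count of 2.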